Incremental Gradient Algorithms with Stepsizes Bounded Away from Zero

Author

  • Mikhail V. Solodov
Abstract

We consider the class of incremental gradient methods for minimizing a sum of continuously differentiable functions. An important novel feature of our analysis is that the stepsizes are kept bounded away from zero. We derive the first convergence results of any kind for this computationally important case. In particular, we show that a certain ε-approximate solution can be obtained and establish the linear dependence of ε on the stepsize limit. Incremental gradient methods are particularly well-suited for large neural network training problems where obtaining an approximate solution is typically sufficient and is often preferable to computing an exact solution. Thus, in the context of neural networks, the approach presented here is related to the principle of tolerant training. Our results justify numerous stepsize rules that were derived on the basis of extensive numerical experimentation but for which no theoretical analysis was previously available. In addition, convergence to (exact) stationary points is established when the gradient satisfies a certain growth property.
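As a rough illustration of the setting described in the abstract, the following minimal sketch runs an incremental gradient method whose stepsize decreases only down to a positive limit. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional quadratic components, the rule `max(eta_bar, eta0 / (k + 1))`, and the names `incremental_gradient`, `eta0`, and `eta_bar`.

```python
def incremental_gradient(grads, x0, eta0, eta_bar, epochs):
    """Cycle through the component gradients one at a time.

    The stepsize for epoch k is max(eta_bar, eta0 / (k + 1)), so it
    decreases at first but never drops below eta_bar > 0.
    (Hypothetical stepsize rule, chosen only for this sketch.)
    """
    x = x0
    for k in range(epochs):
        eta = max(eta_bar, eta0 / (k + 1))
        for g in grads:
            # One incremental step: use the gradient of a single component.
            x = x - eta * g(x)
    return x


# Toy problem (an assumption): f_i(x) = 0.5 * (x - c_i)**2,
# so the sum is minimized at the mean of the targets.
targets = [1.0, 2.0, 6.0]
grads = [lambda x, c=c: x - c for c in targets]


def full_gradient(x):
    """Gradient of the whole sum, used only to measure the final residual."""
    return sum(g(x) for g in grads)


x_large = incremental_gradient(grads, x0=0.0, eta0=0.5, eta_bar=0.1, epochs=2000)
x_small = incremental_gradient(grads, x0=0.0, eta0=0.5, eta_bar=0.01, epochs=2000)

residual_large = abs(full_gradient(x_large))
residual_small = abs(full_gradient(x_small))
print(residual_large, residual_small)
```

On this toy problem the iterates settle near, but not exactly at, the minimizer, and the remaining gradient of the full sum shrinks roughly in proportion to the stepsize limit. That is only a small-scale picture of the ε-approximate behavior the abstract refers to, not a reproduction of the paper's analysis.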


Similar resources

An Incremental Gradient(-Projection) Method with Momentum Term and Adaptive Stepsize Rule

We consider an incremental gradient method with momentum term for minimizing the sum of continuously differentiable functions. This method uses a new adaptive stepsize rule that decreases the stepsize whenever sufficient progress is not made. We show that if the gradients of the functions are bounded and Lipschitz continuous over a certain level set, then every cluster point of the iterates gen...


Analysis of gradient descent methods with non-diminishing, bounded errors

Implementations of stochastic gradient search algorithms such as back propagation typically rely on finite difference (FD) approximation methods. These methods are used to approximate the objective function gradient in steepest descent algorithms as well as the gradient and Hessian inverse in Newton based schemes. The convergence analyses of such schemes critically require that perturbation par...


On Convergence of some Gradient-based Temporal-Differences Algorithms for Off-Policy Learning

We consider off-policy temporal-difference (TD) learning methods for policy evaluation in Markov decision processes with finite spaces and discounted reward criteria, and we present a collection of convergence results for several gradient-based TD algorithms with linear function approximation. The algorithms we analyze include: (i) two basic forms of two-time-scale gradient-based TD algorithms,...


Global convergence of the method of shortest residuals

The method of shortest residuals (SR) was presented by Hestenes and studied by Pitlak. If the function is quadratic, and if the line search is exact, then the SR method reduces to the linear conjugate gradient method. In this paper, we put forward the formulation of the SR method when the line search is inexact. We prove that, if stepsizes satisfy the strong Wolfe conditions, both the Fletcher-...


A Note on the Gradient Projection Method with Exact Stepsize Rule

In this paper, we give some convergence results on the gradient projection method with exact stepsize rule for solving the minimization problem with convex constraints. Especially, we show that if the objective function is convex and its gradient is Lipschitz continuous, then the whole sequence of iterations produced by this method with bounded exact stepsizes converges to a solution of the con...



Journal:
  • Comp. Opt. and Appl.

Volume 11


Publication date: 1998